50        Bioinformatics

together with their gene annotations, which can serve during read alignment (mapping) as guides against which new genomes are assembled quickly. A reference genome

of an organism is a curated sequence built from the DNA of several normal individuals of that organism. Reference genome curation was pioneered by the Genome Reference Consortium (GRC), which was founded in 2008 as a collaboration of

the National Center for Biotechnology Information (NCBI), the European Bioinformatics

Institute (EBI), the McDonnell Genome Institute (MGI), and the Wellcome Sanger Institute

to maintain and update the human and mouse genome reference assemblies. Today, the GRC

maintains the human, mouse, zebrafish, rat, and chicken reference genomes. Reference

genomes of other organisms are curated by specialized institutions, such as NCBI, which manually select genome assemblies to serve as standard, or representative, sequences (RefSeq) against which data from individuals of those organisms can be compared. Eukaryotes have a single reference genome per species, whereas prokaryotes may have multiple reference genome sequences per species. The NCBI curates

reference genomes from the GenBank assemblies that have been selected for RefSeq. If a eukaryotic species has no RefSeq assembly, the best GenBank assembly for that species is selected as its representative genome. Viruses may also have more than one reference genome per species. Updating the reference genome of any species is generally a continuous process, and a new version, usually called a “Build,” may be released whenever new information emerges. A release of a reference genome may be accompanied by gene

annotations. Well-curated reference genomes, such as those of humans and other model organisms, are usually released with annotation information, including gene and variant annotations. Reference genomes are available from the NCBI website in both FASTA and GenBank file formats. Several annotation files may be

FIGURE 2.1  Human reference genome on the NCBI Genome page.